Partial Least Squares Regression Can Aid in Detecting Differential Abundance of Multiple Features in Sets of Metagenomic Samples
Authors
Abstract
It is now feasible to examine the composition and diversity of microbial communities (i.e., "microbiomes") that populate different human organs and orifices using DNA sequencing and related technologies. To explore the potential links between changes in microbial communities and various diseases in the human body, it is essential to test associations involving different species within and across microbiomes, environmental settings and disease states. Although a number of statistical techniques exist for carrying out relevant analyses, it is unclear which of these techniques exhibit the greatest statistical power to detect associations given the complexity of most microbiome datasets. We compared the statistical power of principal component regression, partial least squares regression, regularized regression, distance-based regression, Hill's diversity measures, and a modified test implemented in the widely used microbiome analysis methodology "Metastats" across a wide range of simulated scenarios involving changes in feature abundance between two sets of metagenomic samples. For this purpose, we used simulation to alter the abundance of microbial species in a real dataset from a published study examining human hands. Each technique was applied to the same data, and its ability to detect the simulated change in abundance was assessed. We hypothesized that a small subset of methods would outperform the rest in terms of statistical power. Indeed, we found that two methods, partial least squares regression and the Metastats technique modified to accommodate multivariate analysis, yielded high power under the models and data sets we studied. The statistical power of diversity measure-based tests, distance-based regression and regularized regression was significantly lower.
Our results provide insight into powerful analysis strategies that utilize information on species counts from large microbiome data sets exhibiting skewed frequency distributions obtained on a small to moderate number of samples.
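The simulation design described in the abstract (alter the abundance of some features in one of two sample sets, apply a test, and record how often the change is detected) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the counts are synthetic log-normal draws rather than the real hand dataset, the test uses only the first partial least squares component with a label-permutation p-value, and the function names, sample sizes, fold changes and seeds are hypothetical.

```python
import math
import random


def simulate_counts(n_per_group, n_features, n_changed, fold, rng):
    """Skewed synthetic counts for two groups (assumed log-normal model).

    The first n_changed features are multiplied by `fold` in group 1.
    """
    log_means = [rng.uniform(1.0, 5.0) for _ in range(n_features)]
    X, y = [], []
    for group in (0, 1):
        for _ in range(n_per_group):
            row = []
            for j, mu in enumerate(log_means):
                value = rng.lognormvariate(mu, 1.0)
                if group == 1 and j < n_changed:
                    value *= fold  # spiked-in abundance change
                row.append(int(value))
            X.append(row)
            y.append(group)
    return X, y


def center_log_counts(X):
    """log(1 + count) transform followed by column centering."""
    n, p = len(X), len(X[0])
    logged = [[math.log1p(v) for v in row] for row in X]
    means = [sum(row[j] for row in logged) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in logged]


def pls1_score_r2(Xc, y):
    """Squared correlation between y and the first PLS component score."""
    n, p = len(Xc), len(Xc[0])
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    # For a single response, the first PLS weight vector is proportional to Xc' yc.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    # Component scores t = Xc w (already centered because Xc is centered).
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    ty = sum(a * b for a, b in zip(t, yc))
    tt = sum(a * a for a in t)
    yy = sum(b * b for b in yc)
    if tt == 0.0 or yy == 0.0:
        return 0.0
    return ty * ty / (tt * yy)


def permutation_p_value(Xc, y, n_perm, rng):
    """Permutation p-value: the weights are refitted for every shuffled labeling."""
    observed = pls1_score_r2(Xc, y)
    labels = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if pls1_score_r2(Xc, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def estimate_power(fold, n_reps=20, alpha=0.05, seed=0):
    """Fraction of simulated datasets in which the change is detected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_reps):
        X, y = simulate_counts(10, 30, 5, fold, rng)
        p_value = permutation_p_value(center_log_counts(X), y, 99, rng)
        if p_value <= alpha:
            rejections += 1
    return rejections / n_reps


if __name__ == "__main__":
    for fold in (1.0, 2.0, 4.0):
        print(f"fold change {fold}: estimated power {estimate_power(fold)}")
```

With a fold change of 1.0 no feature differs between the groups, so the rejection rate estimates the false-positive rate of the test rather than its power. Swapping `pls1_score_r2` for a different statistic would allow a like-for-like comparison of methods on identical simulated data, which is the kind of comparison the abstract describes.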
Journal title:
Volume 6, Issue
Pages -
Publication date: 2015